Caguas
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Kim, Hyunjae, Sohn, Jiwoong, Gilson, Aidan, Cochran-Caggiano, Nicholas, Applebaum, Serina, Jin, Heeju, Park, Seihee, Park, Yujin, Park, Jiyeong, Choi, Seoyoung, Contreras, Brittany Alexandra Herrera, Huang, Thomas, Yun, Jaehoon, Wei, Ethan F., Jiang, Roy, Colucci, Leah, Lai, Eric, Dave, Amisha, Guo, Tuo, Singer, Maxwell B., Koo, Yonghoe, Adelman, Ron A., Zou, James, Taylor, Andrew, Cohan, Arman, Xu, Hua, Chen, Qingyu
Large language models (LLMs) are transforming the landscape of medicine, yet two fundamental challenges persist: keeping up with rapidly evolving medical knowledge and providing verifiable, evidence-grounded reasoning. Retrieval-augmented generation (RAG) has been widely adopted to address these limitations by supplementing model outputs with retrieved evidence. However, whether RAG reliably achieves these goals remains unclear. Here, we present the most comprehensive expert evaluation of RAG in medicine to date. Eighteen medical experts contributed a total of 80,502 annotations, assessing 800 model outputs generated by GPT-4o and Llama-3.1-8B across 200 real-world patient and USMLE-style queries. We systematically decomposed the RAG pipeline into three components: (i) evidence retrieval (relevance of retrieved passages), (ii) evidence selection (accuracy of evidence usage), and (iii) response generation (factuality and completeness of outputs). Contrary to expectation, standard RAG often degraded performance: only 22% of top-16 passages were relevant, evidence selection remained weak (precision 41-43%, recall 27-49%), and factuality and completeness dropped by up to 6% and 5%, respectively, compared with non-RAG variants. Retrieval and evidence selection remain key failure points for the model, contributing to the overall performance drop. We further show that simple yet effective strategies, including evidence filtering and query reformulation, substantially mitigate these issues, improving performance on MedMCQA and MedXpertQA by up to 12% and 8.2%, respectively. These findings call for re-examining RAG's role in medicine and highlight the importance of stage-aware evaluation and deliberate system design for reliable medical LLM applications.
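The evidence selection figures quoted above (precision and recall) can be illustrated with a minimal sketch. It assumes that the passages a model actually used and the passages experts marked as relevant are both available as collections of passage IDs. The function name and the example IDs are hypothetical and are not taken from the paper.

```python
def selection_precision_recall(selected, relevant):
    """Precision and recall of evidence selection over passage IDs.

    selected: IDs of passages the model used in its answer (hypothetical input).
    relevant: IDs of passages experts judged relevant (hypothetical input).
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: 4 passages used, 3 judged relevant, 2 in common.
precision, recall = selection_precision_recall(
    ["p1", "p2", "p3", "p4"], ["p2", "p4", "p7"]
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this definition, low precision means the model leaned on irrelevant passages, while low recall means it ignored relevant ones.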
- Europe > Austria > Vienna (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
The Robustness of Structural Features in Species Interaction Networks
Fard, Sanaz Hasanzadeh, Dolson, Emily
Species interaction networks are a powerful tool for describing ecological communities; they typically contain nodes representing species, and edges representing interactions between those species. For the purposes of drawing abstract inferences about groups of similar networks, ecologists often use graph topology metrics to summarize structural features. However, gathering the data that underlies these networks is challenging, which can lead to some interactions being missed. Thus, it is important to understand how much different structural metrics are affected by missing data. To address this question, we analyzed a database of 148 real-world bipartite networks representing four different types of species interactions (pollination, host-parasite, plant-ant, and seed-dispersal). For each network, we measured six different topological properties: number of connected components, variance in node betweenness, variance in node PageRank, largest Eigenvalue, the number of non-zero Eigenvalues, and community detection as determined by four different algorithms. We then tested how these properties change as additional edges -- representing data that may have been missed -- are added to the networks. We found substantial variation in how robust different properties were to the missing data. For example, the Clauset-Newman-Moore and Louvain community detection algorithms showed much more gradual change as edges were added than the label propagation and Girvan-Newman algorithms did, suggesting that the former are more robust. Robustness also varied for some metrics based on interaction type. These results provide a foundation for selecting network properties to use when analyzing messy ecological network data.
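The edge-addition experiment described above can be sketched for one of the listed properties, the number of connected components. This is an illustrative sketch only: the toy network, the function names, and the choice to add absent edges in uniformly random order are assumptions, since the abstract does not specify how the added edges were chosen.

```python
import random


def count_components(nodes, edges):
    """Count connected components with a small union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def components_as_edges_added(left, right, edges, n_new, seed=0):
    """Track the component count while adding absent bipartite edges.

    Assumption: absent edges are added in uniformly random order.
    """
    rng = random.Random(seed)
    present = set(edges)
    absent = [(u, v) for u in left for v in right if (u, v) not in present]
    rng.shuffle(absent)
    nodes = list(left) + list(right)
    counts = [count_components(nodes, present)]
    for edge in absent[:n_new]:
        present.add(edge)
        counts.append(count_components(nodes, present))
    return counts


# Hypothetical toy pollination network: plants on one side, pollinators on the other.
plants = ["plant_a", "plant_b", "plant_c"]
pollinators = ["bee_x", "bee_y"]
observed = [("plant_a", "bee_x"), ("plant_b", "bee_y")]
print(components_as_edges_added(plants, pollinators, observed, n_new=4))
```

A property whose value shifts sharply after only a few added edges would count as less robust to missing data in the sense used above.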
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
- Oceania > New Zealand (0.04)
- North America > United States > Michigan (0.04)
- Telecommunications > Networks (0.34)
- Information Technology > Networks (0.34)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.93)
- Information Technology > Communications > Networks (0.89)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
SentRNA: Improving computational RNA design by incorporating a prior of human design strategies
Shi, Jade, Das, Rhiju, Pande, Vijay S.
Jade Shi, EteRNA players, Rhiju Das, and Vijay S. Pande. Abstract: Designing RNA sequences that fold into specific structures and perform desired biological functions is an emerging field in bioengineering with broad applications from intracellular chemical catalysis to cancer therapy via selective gene silencing. Effective RNA design requires first solving the inverse folding problem: given a target structure, propose a sequence that folds into that structure. Although significant progress has been made in developing computational algorithms for this purpose, current approaches are ineffective at designing sequences for complex targets, limiting their utility in real-world applications. However, human players of the online RNA design game EteRNA have shown significantly higher performance. Through many rounds of gameplay, these players have developed a collective library of "human" rules and strategies for RNA design that have proven to be more effective than current computational approaches, especially for complex targets. Here, we present an RNA design agent, SentRNA, which consists of a fully-connected neural network trained using the eternasolves dataset, a set of 1.8 x 10 The agent first predicts an initial sequence for a target using the trained network, and then refines that solution if necessary using a short adaptive walk utilizing a canon of standard design moves. Through this approach, we observe SentRNA can learn and apply humanlike design strategies to solve several complex targets previously unsolvable by any computational approach. We thus demonstrate that incorporating a prior of human design strategies into a computational agent can significantly boost its performance, and suggest a new paradigm for machine-based RNA design.
Introduction: Solving the inverse folding problem for RNA is a critical prerequisite to effective RNA design, an emerging field of modern bioengineering research. An RNA molecule's function is highly dependent on the structure into which it folds, which in turn is determined by the sequence of nucleotides that comprise it. Therefore, designing RNA molecules to perform specific functions requires designing sequences that fold into specific structures. As such, significant efforts have been made over the past several decades in developing computational algorithms to reliably predict RNA sequences that fold into a given target. Existing computational methods for inverse RNA folding can be roughly separated into two types. The first type generates an initial guess of a sequence and then refines the sequence using some form of stochastic search.
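The "initial guess, then stochastic refinement" pattern described above can be sketched as a toy adaptive walk. This is not SentRNA and does not use a real folding engine. As a loudly stated assumption, the score here is only a proxy: the fraction of positions paired in the target dot-bracket structure whose bases are complementary (Watson-Crick or G-U wobble). All function names are hypothetical, and the target structure is assumed to have balanced brackets.

```python
import random

# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(structure):
    """Return (i, j) index pairs from a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def score(sequence, pairs):
    """Proxy score: fraction of target pairs with complementary bases."""
    if not pairs:
        return 1.0
    ok = sum((sequence[i], sequence[j]) in PAIRS for i, j in pairs)
    return ok / len(pairs)


def adaptive_walk(structure, steps=500, seed=0):
    """Random initial guess, then accept point mutations that do not lower the score."""
    rng = random.Random(seed)
    pairs = paired_positions(structure)
    seq = [rng.choice("ACGU") for _ in structure]
    best = score(seq, pairs)
    for _ in range(steps):
        if best == 1.0:
            break
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice("ACGU")
        new = score(seq, pairs)
        if new >= best:
            best = new
        else:
            seq[i] = old  # revert a harmful mutation
    return "".join(seq), best


sequence, final_score = adaptive_walk("((((....))))")
print(sequence, final_score)
```

A realistic method would replace the proxy score with a comparison between the target and the structure predicted by a folding model for each candidate sequence.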
- Europe > Austria > Vienna (0.04)
- North America > United States > Indiana (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)